Fluctuation-induced traffic congestion in heterogeneous networks 
In studies of complex heterogeneous networks, particularly of the Internet, significant attention has been paid to analyzing network failures caused by hardware faults or overload, where the network reaction was modeled as rerouting of traffic away from failed or congested elements. Here we model another type of network reaction to congestion: a sharp reduction of the input traffic rate through congested routes, which occurs on much shorter time scales. We consider the onset of congestion in the Internet, where a local mismatch between demand and capacity results in traffic losses, and show that it can be described as a phase transition characterized by strong non-Gaussian loss fluctuations at a mesoscopic time scale. The fluctuations, caused by noise in input traffic, are exacerbated by the heterogeneous nature of the network, manifested in a scale-free load distribution. They result in the network strongly overreacting to the first signs of congestion by significantly reducing input traffic along the communication paths where congestion is utterly negligible.
All Internet users are familiar with the feeling of frustration when their network connection slows down or halts. Barring cascading failures [1-7], which can shut down parts of the network, such a slowdown is a sign of network congestion, which happens when the traffic load on some network elements exceeds their capacity [2-4, 6]. For the Internet, congestion can be quantified as the relative difference between the rates of sent and delivered data packets [2, 8], with excess packets being eventually dropped. The first network reaction to a lost packet is a significant reduction of the transmission rate at the source, followed by a slow recovery to the normal rate. When several loss events occur in quick succession, a multiplicative reduction drastically suppresses the transmission rate, which the end user experiences as congestion. If congestion persists for longer, the network eventually reroutes traffic away from congested links, which may overload other links, triggering a cascade of failures [4].

A surprising result of the considerations presented here is that transmission rates might be significantly reduced when the relative number of lost packets is utterly negligible. Such a reduction results from the existence of strong fluctuations of data losses along a typical communication path at the very onset of congestion. The loss fluctuations arise at each link at the threshold of its capacity due to noise in input traffic. Although such fluctuations exist only on shorter, mesoscopic time scales and die out in time, we show that they might trigger an overreaction of a typical transport protocol to the first signs of losses. Normally the protocol aggressively reduces traffic rates along the routes perceived as congested due to multiple loss events. The overreaction results from the probability of such events on the mesoscopic scale being much higher than the product of the single-event probabilities.

The fluctuations are greatly magnified in heterogeneous networks characterized by a power law (PL) distribution of the link load, since congestion on links with high load affects a disproportionately large number of communication paths, as illustrated in Fig. 1. The link load in a network with a homogeneous traffic input distribution is proportional to the link betweenness $B_i$ (roughly, the number of shortest paths through link $i$) [9]. Many heterogeneous networks, including the Internet, fall into the category of scale-free (SF) networks characterized by the PL distribution of node degree [10-14]. The distribution of the relative link load $\ell$ in SF networks also follows a (truncated) PL [14-16],

$$P(\ell) \propto \ell^{-(2+\delta)}, \qquad (1)$$

with an almost universal exponent, $2 + \delta$ between 2.0 and 2.3. The load distribution of the real Internet was found [14, 17, 18] to be in agreement with the above.
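To make the disproportionate role of high-load links concrete, the sketch below samples relative loads from a pure power law $P(\ell) = (1+\delta)\ell^{-(2+\delta)}$ on $\ell \ge 1$ (the normalization and the lower cutoff at 1 are choices made for illustration) and measures what share of load-weighted path memberships falls on the most loaded 3% of links, taking a link's chance of lying on a path to be proportional to its load. The helper names are hypothetical, and this is not the procedure behind Fig. 1.

```python
import random


def sample_relative_loads(n_links, delta, seed=0):
    """Draw relative loads ell >= 1 from P(ell) = (1 + delta) * ell**-(2 + delta).

    Inverse-transform sampling: the survival function is ell**-(1 + delta), so
    ell = u**(-1 / (1 + delta)) with u uniform on (0, 1].
    """
    rng = random.Random(seed)
    return [(1.0 - rng.random()) ** (-1.0 / (1.0 + delta)) for _ in range(n_links)]


def top_share(loads, fraction):
    """Share of load-weighted path memberships held by the top `fraction` of links.

    Assumes a link's chance of lying on a path is proportional to its load.
    """
    ranked = sorted(loads, reverse=True)
    k = max(1, int(fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


loads = sample_relative_loads(100_000, delta=0.2)
print(f"top 3% of links: {top_share(loads, 0.03):.0%} of path memberships")
```

For this continuum distribution the share carried by links above a load $\ell^*$ is $(\ell^*)^{-\delta}$, so with $\delta = 0.2$ the top 3% ($\ell^* = 0.03^{-1/(1+\delta)} \approx 18.6$) should carry a little over half of all memberships; a finite sample fluctuates around that value because the variance of $\ell$ is infinite for $\delta < 1$.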

We focus on a critical regime at the onset of congestion with a small imbalance, $\eta_i = 1 - \tau_i r_i$, between the average packet arrival rate (load), $1/\tau_i$, and the departure rate (capacity), $r_i$, at link $i$. The nodes in the Internet core are routers and the links are output memory buffers (with attached transmission lines). For $\eta_i > 0$ a memory buffer eventually becomes full and a newly arriving packet is dropped. On average, $\eta_i = \langle\Phi_i(T)\rangle$, where $\Phi_i(T)$ is the fraction of packets dropped during an observation time window $T$. Shifting this window in time causes $\Phi_i$ to fluctuate due to inevitable flow fluctuations [19]. We show the loss fluctuations to be crucial for network transport at a certain mesoscopic time scale; for large enough $T$ they die out and $\Phi_i$ equals $\eta_i$. Positive $\eta_i$ plays the role of a local congestion parameter: their sum defines the network congestion parameter [2, 8].

FIG. 1. The importance of links with high betweenness (load): one congested link (3% of all the links here) affects 27% of all the shortest paths. A light-shaded sector on each node indicates the fraction of congested paths originating at it.

The relative loss $\Phi$ along a typical communication path is governed by losses in its constituent links and fluctuates due to both the noise in each link and a random choice of links in the path. The probability of a randomly picked link being in the path is proportional to its betweenness. Hence, in a network with average link betweenness $\bar B$, a small loss along a path with $a$ links is given by

$$\Phi = \sum_{i=1}^{a} \ell_i\,\Phi_i, \qquad \ell_i = B_i/\bar B. \qquad (2)$$

The quenched distribution of the relative load $\ell_i$ is given by the truncated PL (1), cut from below at $\ell \sim 1$. The upper cutoff is irrelevant for $\delta > 0$.

To describe noise in packet arrivals at link $i$, we assume, without loss of generality, that the inter-arrival time is random, with average $\tau_i$, while packets have a fixed length $l_0$. Arriving packets join the queue in the memory buffer. The queue length $x_i(t)$ (measured in units of $l_0$) performs a random walk bounded by the buffer size $c_i$. The probability density $W_i$ of diffusion from $x'$ to $x$ over time $t$ obeys the Fokker-Planck equation with diffusion and advection coefficients $D_i = 1/\tau_i$ and $V_i = \eta_i/\tau_i$ and probability-conserving boundary conditions:

$$\partial_t W_i(x, x'; t) = \left[-V_i\,\partial_x + D_i\,\partial_x^2\right] W_i(x, x'; t). \qquad (3)$$
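A quick consistency check of the boundary density that enters below, $p_i = \eta_i/(1 - e^{-\eta_i c_i})$: the stationary solution of Eq. (3) with reflecting walls is $W_i \propto e^{V_i x/D_i} = e^{\eta_i x}$, and normalizing it on $[0, c_i]$ gives $p_i$ at $x = c_i$. The sketch compares this with the exact loss fraction of an M/M/1/c queue (Poisson arrivals at rate $1/\tau$, exponential service at rate $r$), which serves only as an assumed stand-in whose drift and noise match Eq. (3) near criticality; the text itself assumes fixed-length packets. Function names are hypothetical.

```python
import math


def boundary_density_diffusion(eta, c):
    """Stationary density at the buffer boundary from the Fokker-Planck model,
    p = eta / (1 - exp(-eta * c)); the eta -> 0 limit is 1 / c."""
    if abs(eta) < 1e-12:
        return 1.0 / c
    return eta / (1.0 - math.exp(-eta * c))


def blocking_mm1c(eta, c):
    """Exact loss fraction of an M/M/1/c queue (room for c packets) with arrival
    rate 1/tau and service rate r, where tau * r = 1 - eta.

    The stationary law is pi_k ~ rho**k, k = 0..c, with rho = 1 / (1 - eta);
    by PASTA an arriving packet is dropped with probability pi_c.
    """
    rho = 1.0 / (1.0 - eta)
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (c + 1)
    return rho**c * (1.0 - rho) / (1.0 - rho ** (c + 1))


for eta in (-0.02, 0.0, 0.01, 0.05):
    print(eta, boundary_density_diffusion(eta, 100), blocking_mm1c(eta, 100))
```

Near $\eta = 0$ the diffusion formula tends to $1/c$ while the discrete queue gives $1/(c+1)$; for $\eta c \gg 1$ both approach $\eta$, the fraction of excess arrivals.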

In the critical regime the queue hovers near the boundary. A newly arriving packet is dropped every time the queue length $x_i(t)$ overflows, reaching the boundary layer $c_i - 1 < x_i(t) \le c_i$. Thus the fraction of packets lost over an observation time $T \gg \tau_i$ is

$$\Phi_i(T) = \frac{1}{K_i}\sum_{k=1}^{K_i}\theta_k, \qquad (4)$$

where $K_i = [T/\tau_i] \gg 1$ is the number of packets that arrived over time $T$, and $\theta_k$ equals 1 if $x_i(t)$ reached the boundary layer at the $k$-th step, and 0 otherwise.
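The windowed loss fraction of Eq. (4) can be watched directly in a small Monte Carlo sketch. It reuses the assumed M/M/1/c stand-in (not the authors' model), simulates it event by event, and records $\Phi_i$ for consecutive windows of $k$ arrivals. With the window much shorter than the relaxation time of the queue, most windows should show no loss at all while a minority show large losses, which is the intermittency discussed later in the text. The function name and parameter values are illustrative only.

```python
import random


def window_losses(eta, c, k, n_windows, seed=0):
    """Fraction of dropped packets, Phi_i(T), in consecutive windows of k arrivals.

    Stand-in queue (an assumption): M/M/1/c with arrival rate 1/tau and service
    rate r, tau * r = 1 - eta, simulated event by event (uniformized chain).
    """
    rng = random.Random(seed)
    p_arrival = 1.0 / (2.0 - eta)   # (1/tau) / (1/tau + r)
    x = c // 2                      # initial queue length in packets
    phis = []
    for _ in range(n_windows):
        arrived = lost = 0
        while arrived < k:
            if rng.random() < p_arrival:   # an arrival
                arrived += 1
                if x == c:                 # buffer full: packet dropped, theta_k = 1
                    lost += 1
                else:
                    x += 1
            elif x > 0:                    # a departure (no-op on an empty queue)
                x -= 1
        phis.append(lost / k)
    return phis


# T ~ 1000 inter-arrival times, so phi0 = sqrt(tau/T) ~ 0.03, well above eta = 0.005
phis = window_losses(eta=0.005, c=100, k=1000, n_windows=1000)
mean = sum(phis) / len(phis)
zero_share = sum(1 for p in phis if p == 0.0) / len(phis)
second = sum(p * p for p in phis) / len(phis)
print(f"<Phi> = {mean:.4f}, windows without loss: {zero_share:.0%}, "
      f"<Phi^2>/<Phi>^2 = {second / mean**2:.1f}")
```

For independent (Poisson) losses the ratio $\langle\Phi^2\rangle/\langle\Phi\rangle^2$ would be $1 + 1/N$, with $N$ the mean number of losses per window; here it should come out well above that, because a window either finds the queue far from the wall or pinned against it.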

To find the probability density function (PDF) of $\Phi_i$, we begin with the characteristic function, $\chi_T(q_i) = \langle e^{iq_i\Lambda_i}\rangle$, of the distribution of the cumulative loss $\Lambda_i = (T/\tau_i)\,\Phi_i(T)$. Using Eqs. (3) and (4), we represent $\chi_T$ as the sum of time-ordered integrals

$$\chi_T(q_i) = 1 + \sum_{n=1}^{\infty}\left(\frac{iq_i}{\tau_i}\right)^{n}\int dt_1\cdots dt_n\; p_i\,\mathcal R_i(t_2-t_1)\cdots\mathcal R_i(t_n-t_{n-1}),$$

running over the regions $0 < t_1 < \cdots < t_n < T$. Here $p_i = W_i(c_i, x'; t \to \infty) = \eta_i/(1 - e^{-\eta_i c_i})$ is the stationary probability density for the queue to be in the boundary layer and $\mathcal R_i(t) = W_i(c_i, c_i; t)$ is the return probability. We rewrite the expression for $\chi_T$ as the integral equation

$$\chi_T(q_i) - 1 = \frac{iq_i}{\tau_i}\left\{p_i T + \int_0^T dt\,\mathcal R_i(T-t)\left[\chi_t(q_i) - 1\right]\right\}. \qquad (5)$$

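The inverse Fourier transform from $q_i$ to $\Lambda_i$ shows the PDF of $\Lambda_i$ to be the sum $\mathcal F_T(\Lambda_i) + A_i\,\delta(\Lambda_i)$, with $\mathcal F_T$ describing losses ($\Lambda_i > 0$) and $A_i$ the probability of no losses over the time $T$, where $1 = A_i + \int_0^\infty d\Lambda\,\mathcal F_T(\Lambda)$. Solving Eq. (5) by the Laplace transform with respect to $T$ gives

$$\int_0^\infty dT\,e^{-sT}\,\mathcal F_T(\Lambda_i) = \frac{p_i\tau_i}{s^2\,\tilde{\mathcal R}_i^{2}(s)}\exp\!\left[-\frac{\tau_i\Lambda_i}{\tilde{\mathcal R}_i(s)}\right], \qquad (6)$$

where $\tilde{\mathcal R}_i(s)$ is the Laplace transform of the return probability $\mathcal R_i(t)$.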

For a perfectly designed network with fully utilized resources for homogeneous input traffic, $c_i$ and $\tau_i^{-1}$ are proportional to the relative link load [15], $c_i = c\,\ell_i$ and $\tau_i^{-1} = \ell_i\,\tau^{-1}$, while the imbalance $\eta_i$ is $\ell_i$-independent. Then the congestion threshold $\eta_i = 0$ is reached simultaneously by all links. Naturally, such complete utilization is impossible: design imperfections and local variations of demand cause the congestion thresholds to spread [6]. We model such a spread (quenched on the relevant time scales) as a sharply peaked symmetric distribution of $\eta_i$ with criticality width $\gamma \ll 1$. Realistic regimes are bounded by $a \ll \gamma^{-1} \ll c$, as for $\gamma^{-1} > c$ the congestion thresholds are still reached simultaneously, while a network with $a\gamma \gtrsim 1$ would be permanently congested.

The PDF of $\Phi_i$ has the same form as that of $\Lambda_i$, namely $A_i\,\delta(\Phi_i) + P_T(\Phi_i; \ell_i)$, where its lossy part, $P_T(\Phi_i; \ell_i)$, is found for fixed $\ell_i$ and $\eta_i$ by rescaling $\mathcal F_T$, the inverse Laplace transform of Eq. (6), as $(\ell_i/\varphi_0^2)\,\mathcal F_T(\ell_i\Phi_i/\varphi_0^2)$ with $\varphi_0 \equiv \sqrt{\tau/T}$. The averaging over $\eta$ is straightforward and preserves very different shapes of $P_T$ on the mesoscopic time scale, $\gamma \ll \varphi_0 \ll \sqrt{\gamma}$, and on the macroscopic one, $\varphi_0 \ll \gamma$. In the former case, the congestion spread is so narrow that to average over it one replaces $\eta_i$ by $\gamma$, which gives the averaged PDF $\sim \ell_i\gamma/\varphi_0^2$ for $\Phi_i < \varphi_0/\sqrt{\ell_i}$, followed by a decay $\sim e^{-\ell_i\Phi_i^2/\varphi_0^2}$ for bigger $\Phi_i$. The probability of losing a packet, $1 - A_i \approx \gamma\sqrt{\ell_i}/\varphi_0 \ll 1$, is small. For macroscopic times, the Laplace transform in Eq. (6) is exponential, corresponding to the averaged PDF given by $\delta(\Phi_i - \eta_i)$ for $\eta_i > 0$ [20]: on this scale $P_T(\Phi_i; \ell_i)$ repeats the shape of the criticality spread for positive $\eta_i$, while free-flow (lossless) links with negative $\eta_i$ give $A_i = 1/2$.
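As a derivation sketch, assume Eq. (5) in the form $\chi_T - 1 = (iq_i/\tau_i)\{p_iT + \int_0^T dt\,\mathcal R_i(T-t)[\chi_t - 1]\}$ and let a tilde denote the Laplace transform in $T$. The convolution then becomes a product, the Fourier inversion in $q_i$ is elementary, and the macroscopic limit quoted above follows once the return probability has relaxed to its stationary value, $\mathcal R_i(t) \to p_i$. This only makes the structure of the solution visible; it is not a reproduction of the authors' intermediate steps.

```latex
% Laplace transform of Eq. (5), with f_T \equiv \chi_T(q_i) - 1:
\tilde f(s) = \frac{iq_i}{\tau_i}\Big[\frac{p_i}{s^2} + \tilde{\mathcal R}_i(s)\,\tilde f(s)\Big]
\;\Longrightarrow\;
\tilde f(s) = \frac{iq_i\,p_i/\tau_i}{s^2\,\big[1 - iq_i\,\tilde{\mathcal R}_i(s)/\tau_i\big]} .

% Fourier inversion with b = \tilde{\mathcal R}_i(s)/\tau_i > 0, using
% \int \frac{dq}{2\pi}\,\frac{e^{-iq\Lambda}}{1 - iqb} = \frac{1}{b}\,e^{-\Lambda/b}\,\theta(\Lambda):
\tilde A_i(s) = \frac{1}{s} - \frac{p_i}{s^2\,\tilde{\mathcal R}_i(s)}, \qquad
\tilde{\mathcal F}(s;\Lambda_i) = \frac{p_i\tau_i}{s^2\,\tilde{\mathcal R}_i^{2}(s)}
  \exp\Big[-\frac{\tau_i\Lambda_i}{\tilde{\mathcal R}_i(s)}\Big].

% Checks: \tilde A_i + \int_0^\infty d\Lambda\,\tilde{\mathcal F} = 1/s (normalization), and
% \int_0^\infty d\Lambda\,\Lambda\,\tilde{\mathcal F} = p_i/(\tau_i s^2), i.e. \langle\Lambda_i\rangle = p_i T/\tau_i.

% Macroscopic times: \mathcal R_i(t) \to p_i, so \tilde{\mathcal R}_i(s) \simeq p_i/s, \tilde A_i \to 0, and
\tilde{\mathcal F}(s;\Lambda_i) \simeq \frac{\tau_i}{p_i}\,e^{-s\tau_i\Lambda_i/p_i}
\;\Longrightarrow\;
\mathcal F_T(\Lambda_i) = \delta\Big(\Lambda_i - \frac{p_i T}{\tau_i}\Big),
\qquad \Phi_i = \frac{\tau_i\Lambda_i}{T} = p_i \simeq \eta_i \quad (\eta_i c_i \gg 1).
```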

The PDF of losses along path (2) is a convolution of the PDFs of the statistically independent $\ell_i\Phi_i$, averaged over the load distribution (1). It still has the structure $A\,\delta(\Phi) + P_T(\Phi)$ with $A = \prod_{i=1}^{a} A_i$. For $\varphi_0 \ll \gamma$, losses in each link in (2) are on the macroscopic time scale, light-shaded (green online) in Fig. 2. Then $A = 2^{-a} \ll 1$ for $a \gg 1$ and, by the law of large numbers, $P_T(\Phi)$ is a normal distribution with average $\langle\Phi\rangle \sim a\gamma \ll 1$ (where $\langle\cdots\rangle$ stands for the averaging over all three sources of randomness) and width $\sim \langle\Phi\rangle/\sqrt{a}$ (inset in Fig. 3).

FIG. 2. Different loss modes depending on the operation time $T$ and criticality width $\gamma$. The dark-shaded (red online) area above the upper bold line, $\gamma^{-1} = a\sqrt{T/\tau}$, represents the fluctuation-driven mesoscopic mode; the light-shaded (green online) area below the lower bold line, $\gamma^{-1} = \sqrt{T/\tau}$, the self-averaging macroscopic mode (microscopic times to the left of the parabola, $\gamma^{-1} = T/\tau$, are not considered). The crossover sector between the bold lines is narrow as $c \gg a \gg 1$ ($a$ is the number of links in path (2), $c$ is the memory buffer size for a link with $\ell_i = 1$). The hatched area shows the region of network feedback operations at the congestion onset.

FIG. 3. Probability distribution function, $P_T(\Phi)$, of relative losses in the critical regime at the onset of congestion. The average, $\langle\Phi\rangle \sim a\gamma \ll 1$, is time-independent. The main figure shows (not to scale) $P_T(\Phi)$ when the observation time $T$ is on the mesoscopic scale. With increasing $T$, as $\varphi_0 = \sqrt{\tau/T}$ decreases, the plateau becomes narrower and higher, the tail contracts, and the area of the peak at $\Phi = 0$ shrinks. After passing through the intermediate time scale, $T$ reaches the macroscopic scale, where the PDF becomes Gaussian, as shown in the inset.
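The statement $A = 2^{-a}$ can be checked with a toy sampler of Eq. (2) in the macroscopic mode, where each link loses exactly $\max(\eta_i, 0)$. The uniform spread of $\eta_i$ on $[-\gamma, \gamma]$ and the power-law sampling of $\ell_i$ are assumptions made for illustration; the text only requires a sharply peaked symmetric spread. The function name is hypothetical.

```python
import random


def macroscopic_path_losses(a, gamma, delta, n_paths, seed=1):
    """Toy sampler of Eq. (2) in the macroscopic mode: Phi = sum_i ell_i * max(eta_i, 0).

    Assumptions for illustration: eta_i uniform on [-gamma, gamma] (symmetric spread)
    and ell_i drawn from (1 + delta) * ell**-(2 + delta) on ell >= 1.
    """
    rng = random.Random(seed)
    losses = []
    for _ in range(n_paths):
        phi = 0.0
        for _ in range(a):
            ell = (1.0 - rng.random()) ** (-1.0 / (1.0 + delta))
            eta = rng.uniform(-gamma, gamma)
            phi += ell * max(eta, 0.0)   # free-flow links (eta <= 0) lose nothing
        losses.append(phi)
    return losses


a = 8
losses = macroscopic_path_losses(a=a, gamma=1e-3, delta=0.2, n_paths=100_000)
lossless = sum(1 for x in losses if x == 0.0) / len(losses)
print(f"lossless paths: {lossless:.4f} (2**-a = {2.0**-a:.4f})")
```

Each link is lossless with probability 1/2, so a path is lossless with probability $2^{-a}$; already for $a = 8$ almost every path loses something, in line with $A \ll 1$.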

The communication path is in a nontrivial fluctuation-driven mode only if all its constituent links are in that mode, obeying $\ell_i \ll \varphi_0^2/\gamma^2$. For $\delta > 0$ in Eq. (1) and $a\gamma \ll \varphi_0 \ll \sqrt{\gamma}$, which defines the mesoscopic time scale for the path (the dark-shaded area, red online, in Fig. 2), one has $A = (1 - \gamma/\varphi_0)^a \approx 1 - a\gamma/\varphi_0$. The PDF has a peak at $\Phi = 0$, describing the close-to-1 probability $A$ of not losing a packet, while the lossy part, $P_T(\Phi)$, is dominated by (any) one link along the path: $P_T(\Phi) \approx a\,\langle \ell^{-1} P_T(\Phi/\ell; \ell)\rangle_\ell$. The plateau in this rescaled PDF is stretched up to $\varphi_0\ell^{1/2}$ with height $a\gamma/\varphi_0^2$. For $\Phi < \varphi_0$, averaging over the distribution (1) leaves the plateau intact. For $\Phi > \varphi_0$, the plateau exists only for links with load $\ell > (\Phi/\varphi_0)^2$, which becomes the lower limit in the averaging over load. When the lower limit is much smaller than the upper one, $(\varphi_0/\gamma)^2$, the averaging is dominated by the former, resulting in the PL tail

$$P_T(\Phi) \approx \frac{a\gamma}{\varphi_0^2}\left(\frac{\Phi}{\varphi_0}\right)^{-2(1+\delta)}, \qquad (7)$$

followed by an exponentially small decay.

The entire PDF is shown in Fig. 3. The probability of losses is small, but when losses do occur, the conditional probability of higher-than-average losses (governed by the long plateau and the even longer heavy tail) is high: the losses are intermittent. The tail (7) is proportional to $a$ since it is dominated by losses in any one link along the path. On the other hand, this link is shared by a large number of paths, as illustrated in Fig. 1.

The tail (7) is irrelevant for $\langle\Phi\rangle$ if $\delta > 0$ but dominates the higher moments of intermittent losses if $\delta < 1/2$. Remarkably, in the SF models of the Internet as well as in direct measurements of its link load [15-18], the exponent $2 + \delta$ in Eq. (1) lies between 2.0 and 2.3, i.e. it obeys $0 < \delta < 1/2$. The relative values of all higher moments in this mode are large: e.g., $\langle\Phi^2\rangle/\langle\Phi\rangle^2 \sim a^{-1}(\varphi_0/\gamma)^{2(1-\delta)} \gg 1$.
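To see how the tail (7) inflates the second moment while leaving the mean nearly unchanged, the sketch below integrates a deliberately crude model of the lossy part of the path PDF: a flat plateau of height $a\gamma/\varphi_0^2$ up to $\Phi = \varphi_0$, then the tail (7) up to a sharp cutoff $\Phi_{\max} = \varphi_0^2/\gamma$, which follows from the upper load limit $(\varphi_0/\gamma)^2$. The sharp edges are simplifying assumptions, only $0 < \delta < 1/2$ is handled, and the peak at $\Phi = 0$ does not contribute to the moments.

```python
def toy_path_moments(a, gamma, phi0, delta):
    """First and second moments of a crude model of the lossy part of P_T(Phi).

    Model (assumed, 0 < delta < 1/2): plateau of height h = a*gamma/phi0**2 on
    [0, phi0], then the tail of Eq. (7), h * (Phi/phi0)**(-2*(1 + delta)), on
    [phi0, phi_max] with a sharp cutoff phi_max = phi0**2 / gamma.
    """
    h = a * gamma / phi0**2
    phi_max = phi0**2 / gamma
    # plateau: integral of Phi**n * h over [0, phi0]
    m1 = h * phi0**2 / 2
    m2 = h * phi0**3 / 3
    # tail: h * phi0**(2 + 2*delta) * integral of Phi**(n - 2 - 2*delta) over [phi0, phi_max]
    pref = h * phi0 ** (2 + 2 * delta)
    m1 += pref * (phi0 ** (-2 * delta) - phi_max ** (-2 * delta)) / (2 * delta)
    m2 += pref * (phi_max ** (1 - 2 * delta) - phi0 ** (1 - 2 * delta)) / (1 - 2 * delta)
    return m1, m2


a, gamma, delta = 10, 1e-4, 0.2        # illustrative values
for phi0 in (0.01, 0.03):
    m1, m2 = toy_path_moments(a, gamma, phi0, delta)
    print(f"phi0={phi0}: <Phi>/(a*gamma)={m1 / (a * gamma):.2f}, "
          f"<Phi^2>/<Phi>^2={m2 / m1**2:.0f}")
```

Tripling $\varphi_0$ (shortening $T$ ninefold) should change the mean by only a few percent but raise $\langle\Phi^2\rangle/\langle\Phi\rangle^2$ by roughly $3^{2(1-\delta)}$, the scaling quoted above.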

This relatively high magnitude of the higher moments means that losses occur in groups (intermittency). Indeed, using Eq. (4), the fraction of dropped pairs can be represented as $\sum_{k<j}\theta_k\theta_j$ normalized by the number of pairs, $K_i(K_i-1)/2$. It is approximately equal to $\Phi_i^2(T)$ if a typical number of lost packets, $\eta_i K_i$, is large. Similarly, the fraction of $n$-tuples of dropped packets is equal to $\Phi_i^n(T)$ (for $n \ll \eta_i K_i$). Then the small probability of losing, e.g., three packets, $\langle\Phi^3\rangle$, is much bigger on the mesoscopic time scale than the probability of three independent loss events, $\langle\Phi\rangle^3$.
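The counting behind these statements is elementary: with $\Lambda_i = \sum_k\theta_k$ losses among $K_i$ packets, the fraction of dropped $n$-tuples is $\binom{\Lambda_i}{n}/\binom{K_i}{n}$, which approaches $\Phi_i^n$ once $\Lambda_i \gg n$. A minimal check, with a hypothetical function name:

```python
from math import comb


def ntuple_fraction(theta, n):
    """Fraction of all n-tuples of the K = len(theta) packets in which every packet
    was dropped, for loss indicators theta_k in {0, 1} as in Eq. (4)."""
    k_total, lost = len(theta), sum(theta)
    return comb(lost, n) / comb(k_total, n)


theta = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]           # 4 losses among K = 10 packets
phi = sum(theta) / len(theta)
print(ntuple_fraction(theta, 2), phi**2)          # few losses: pair fraction below Phi**2

many_losses = [1] * 1000 + [0] * 9000             # Phi = 0.1 with 1000 losses
print(ntuple_fraction(many_losses, 3), 0.1**3)    # many losses: close to Phi**3
```

Only the number of losses in the window enters, not their order, so the distinction between grouped and isolated losses shows up only through the statistics of $\Phi_i$ across windows.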

The intermittency is the reason for a strong adverse effect of the fluctuations on network feedback. We illustrate this for the feedback mechanism provided by an idealized transmission control protocol (TCP), which handles most data transfer. It works in cycles, each consisting of sending a group of $W$ packets from the source and receiving acknowledgments of their delivery to the destination [21]. The transmission rate equals $W/t_0$, where the cycle duration $t_0$ is normally the round-trip time. On establishing a connection, $W$ increases linearly in time until it reaches a steady-state value determined, in the absence of losses, by the end-user resources. If a loss is detected during the cycle, the protocol halves $W$. If losses occur in a few nearby cycles, $W$ is multiplicatively reduced. As it can increase only additively, such a multiple-loss event results in delays noticeable to the end user. Indeed, at the onset of congestion a typical round-trip time is governed by a queue in a single full buffer [21] and is of order $t_0 \approx 0.25$ s. As $W > 100$ in the free-flow regime, the time needed to resume the normal rate of service could be tens of seconds.
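The recovery-time estimate can be reproduced with a few lines of arithmetic. The sketch assumes the textbook additive increase of one packet per cycle together with the values quoted above ($t_0 = 0.25$ s, $W = 100$); real TCP variants differ in detail, and the function name is hypothetical.

```python
import math


def recovery_time(w_normal, n_halvings, t0=0.25, increment=1.0):
    """Seconds needed to climb back to w_normal after n_halvings successive halvings
    of W, assuming W grows by `increment` packets per cycle of duration t0."""
    w = w_normal / 2**n_halvings
    cycles = math.ceil((w_normal - w) / increment)
    return cycles * t0


for n in (1, 2, 3):
    print(n, recovery_time(100, n))
```

One halving costs 50 cycles (12.5 s); three losses in nearby cycles push the recovery past 20 s.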

Hence, it is crucial to know whether the protocol operation time scale at the onset of congestion, characterized by $\varphi_0(t_0) = \sqrt{\tau/t_0}$, falls into the mesoscopic mode, where the relative probability of multiple losses is high. To this end, note that the memory buffer size $c_i$ of any link is related to its capacity (maximal sending rate) by the engineering 'rule of thumb' [21], $c_i = t_0 r_i$, which ensures that any full buffer empties during the same time $t_0$. As $\tau_i r_i \approx 1$ at the congestion threshold, we find $\varphi_0(t_0) \approx 1/\sqrt{c}$. We show this region of protocol operations as the hatched area in Fig. 2, which spans many orders of magnitude in $\gamma^{-1}$.
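A sketch of the bookkeeping behind Fig. 2 at the protocol time scale: with $\varphi_0(t_0) = 1/\sqrt{c}$, the mode follows from comparing $a\gamma$, $\gamma$ and $\sqrt{\gamma}$ with $\varphi_0$. Plain comparisons stand in for the strong inequalities of the text, and the buffer size $c = 10^4$ in the example is a hypothetical value, not the one used by the authors.

```python
import math


def loss_mode(gamma, a, c):
    """Classify the loss regime at the protocol time scale T = t0.

    phi0(t0) = sqrt(tau / t0) = 1 / sqrt(c) follows from c = t0 * r and tau * r ~ 1.
    Plain comparisons stand in for the strong inequalities of the text.
    """
    phi0 = 1.0 / math.sqrt(c)
    if phi0 > math.sqrt(gamma):
        return "microscopic"    # left of the parabola gamma**-1 = T/tau (not considered)
    if a * gamma < phi0:
        return "mesoscopic"     # above the upper bold line of Fig. 2
    if phi0 < gamma:
        return "macroscopic"    # below the lower bold line
    return "crossover"


c, a = 10**4, 10                # c is hypothetical; a ~ 10 as in the text
for gamma in (1e-5, 2e-4, 5e-3, 5e-2):
    print(f"1/gamma = {1 / gamma:8.0f}: {loss_mode(gamma, a, c)}")
```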

The criticality width $\gamma$ is not directly measurable but is bounded. The upper bound in Fig. 2, $\gamma^{-1} < c$, corresponds to approaching the perfect design (full utilization), as explained after Eq. (6). The lower bound, $\gamma^{-1} \gg aW$, is determined by the condition $\langle\Phi\rangle \sim a\gamma \ll 1/W$: if it were not fulfilled, at least one packet per cycle would be lost, resulting in a sharp reduction of the transmission rate after a few cycles, i.e. strong congestion. Assuming $c \sim 10^{\ldots}$ for the typical buffer size (in packets) [21] and $a \sim 10$ for the average number of links in an end-to-end path across the Internet [2, 13], we see that $aW$ is close to $\sqrt{c}$, so that almost the entire hatched area corresponds to the mesoscopic mode of intermittent losses.

On the macroscopic time scale (at the bottom of the hatched area) the probability of detecting two loss indicators in two consecutive cycles is simply equal to $\langle\Phi\rangle^2$. However, for larger $\gamma^{-1}$ the ratio $\langle\Phi^2\rangle/\langle\Phi\rangle^2$ increases, reaching $c^{1-\delta}/a$ at the top, where $\gamma \approx c^{-1}$. There it varies from $10^{\ldots}$ for $\delta = 1/2$ to $10^{\ldots}$ for $\delta \to 0$. The corresponding ratio for detecting three loss indicators in three consecutive cycles is even more striking, varying from $10^{\ldots}$ to $10^{\ldots}$. Conversely, the same multiple-loss indicators may correspond to very different average losses $a\gamma$. And it is the time-independent average loss that matters, since the intermittent fluctuations die out with time as operations move from the dangerous dark-shaded (red online) area to the safe light-shaded (green online) one; see Fig. 2. Hence, it is of great importance to design protocols capable of avoiding the overestimation of nascent losses by identifying in which loss mode the network operates and adjusting accordingly. To this end one needs to distinguish between single and multiple packet loss events within one cycle, information that is not collected in normal TCP operations.
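The growth of the pair ratio across the hatched area can be tabulated from the scaling $\langle\Phi^2\rangle/\langle\Phi\rangle^2 \sim a^{-1}(\varphi_0/\gamma)^{2(1-\delta)}$ with $\varphi_0 = 1/\sqrt{c}$, which reproduces the value $c^{1-\delta}/a$ at the top, $\gamma \approx 1/c$. Clipping the estimate at 1 (independent losses) and the buffer size $c = 10^4$ are assumptions for illustration, so the printed numbers are not the ones quoted in the text.

```python
import math


def pair_ratio(gamma, c, a, delta):
    """Order-of-magnitude <Phi^2>/<Phi>^2 at T = t0 from a**-1 * (phi0/gamma)**(2(1 - delta)),
    with phi0 = 1/sqrt(c); clipped at 1, the value for independent loss events."""
    phi0 = 1.0 / math.sqrt(c)
    return max(1.0, (phi0 / gamma) ** (2.0 * (1.0 - delta)) / a)


c, a = 10**4, 10                       # hypothetical buffer size; a ~ 10 as in the text
for inv_gamma in (10**2, 10**3, 10**4):
    row = [pair_ratio(1.0 / inv_gamma, c, a, d) for d in (0.5, 0.2, 0.0)]
    print(inv_gamma, [f"{r:.3g}" for r in row])
```

Smaller $\delta$ (a heavier load tail) makes the top of the hatched area far more dangerous, which is why the measured Internet exponents close to 2 matter.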



The key features of our model might be characteristic of complex networks other than the Internet. First, if link (or node) operations in a network can be described by a finite-capacity model, the network will suffer from local congestion fluctuations at the threshold of capacity due to inevitable input noise. Second, if such a network is heterogeneous, with a PL load distribution, the fluctuations will be greatly enhanced on highly loaded network elements. Finally, the fluctuations become really dangerous when they are misinterpreted by a network feedback mechanism (the transmission control protocol in the case of the Internet). This mechanism is specific to each type of network, and whether it may trigger fluctuation-induced congestion requires ad hoc consideration.